Single-cell genomics reveals plastid-lacking Picozoa are close relatives of red algae

2021-11-18 11:33:04 By : Ms. Emily Zhou


Nature Communications volume 12, Article number: 6651 (2021)

The endosymbiotic origin of plastids from cyanobacteria gave eukaryotes photosynthetic capabilities and launched the diversification of countless forms of algae. These primary plastids are found in members of the eukaryotic supergroup Archaeplastida. All known archaeplastids still retain some form of primary plastid, and these plastids are widely assumed to have a single origin. Here, we use single-cell genomics from natural samples combined with phylogenomics to infer the evolutionary origin of the phylum Picozoa, a globally distributed but seemingly rare group of marine microbial heterotrophic eukaryotes. Strikingly, the analysis of 43 single-cell genomes shows that Picozoa belong to Archaeplastida, specifically related to red algae and the phagotrophic rhodelphids. These picozoan genomes support the hypothesis that Picozoa lack a plastid, and further reveal no evidence of an early cryptic endosymbiosis with cyanobacteria. These findings change our understanding of plastid evolution, as they either represent the first complete plastid loss in a free-living taxon, or indicate that red algae and rhodelphids obtained their plastids independently of other archaeplastids.

The origin of plastids by endosymbiosis between a eukaryotic host and cyanobacteria was a fundamental transition in eukaryotic evolution, giving rise to the first photosynthetic eukaryotes. These ancient primary plastids, estimated to have originated 1.8 billion years ago1, are found in Rhodophyta (red algae), Chloroplastida (green algae, including land plants), and Glaucophyta (glaucophytes), which together form the eukaryotic supergroup Archaeplastida2. Unravelling the sequence of events that led to the establishment of the cyanobacterial endosymbiont in Archaeplastida is complicated by the antiquity of the event and by the absence of modern descendants of early-diverging relatives of the main archaeplastid groups from current culture collections and sequence databases. Indeed, the only other known example of primary endosymbiosis is the chromatophore of an unrelated amoeba (Paulinella), which originated about 1 billion years later3,4. Recently, two newly described phyla (Prasinodermophyta and Rhodelphidia) were found to branch as sister to green algae and red algae, respectively5,6. Most transformative was the discovery that rhodelphids are obligate phagotrophs that maintain cryptic non-photosynthetic plastids, which implies that the ancestor of red algae may have been mixotrophic. This finding has greatly changed our views of early archaeplastid evolution5.

Although there is substantial evidence that Archaeplastida descend from a photosynthetic ancestor, non-photosynthetic and plastid-lacking lineages have been found to branch near the base of, or even within, archaeplastids in phylogenomic trees. For example, Cryptista (which includes plastid-lacking species and species with secondary plastids) was inferred to be sister to green algae and glaucophytes7 or to red algae5,8, although other phylogenomic analyses have recovered the monophyly of Archaeplastida to the exclusion of cryptists5,9,10. Another non-photosynthetic group that has recently been shown by phylogenomics to be related to red algae is Picozoa5,9,10. As for cryptists, however, the position of Picozoa has lacked consistent support, mainly because no member of Picozoa is available in continuous culture and genomic data are currently limited to a few incomplete single amplified genomes (SAGs)11. The origin of Picozoa therefore remains unclear.

Picozoa (previously known as picobiliphytes) were first described in 2007 from marine environmental clone libraries of the 18S ribosomal RNA (rRNA) gene and were observed by fluorescence microscopy in temperate waters. Because of an orange autofluorescence that was reminiscent of the photosynthetic pigment phycobiliprotein and was emitted from an organelle-like structure, picozoans were initially described as likely containing a plastid. Orange fluorescence associated with these uncultured cells was also observed in subtropical waters. However, the characterization of SAG data from three picozoan cells isolated by fluorescence-activated cell sorting (FACS)11 challenged the idea that the cells are photosynthetic. The analysis of these SAGs revealed neither plastid DNA nor nuclear-encoded plastid-targeted proteins, but the scope of these conclusions was limited by the small number of cells analysed and by the highly fragmented and incomplete data. Most interestingly, a transient culture was later established, which allowed the formal description of the first (and so far only) picozoan species, Picomonas judraskeda, together with ultrastructural observations by electron microscopy14. These observations revealed an unusual cell structure with two body parts and a feeding strategy based on endocytosis of nano-sized colloid particles, and confirmed the absence of plastids. Only the 18S rRNA gene sequence of P. judraskeda is available, because the transient culture was lost before genomic data could be generated.

Here, we analysed genomic data from 43 picozoan single-cell genomes from the Pacific Ocean off the coast of California and from the Baltic Sea. Using a gene- and taxon-rich phylogenomic dataset, these data allowed us to robustly infer that Picozoa are a lineage of archaeplastids, branching as sister to red algae and rhodelphids. With this expanded genomic dataset, we confirm that Picozoa are the first archaeplastid lineage lacking a plastid. We discuss the implications of these results for our understanding of the origin of plastids.

We used FACS to isolate 43 picozoan cells (40 from the eastern North Pacific off the coast of California and 3 from the Baltic Sea) and performed whole-genome amplification by multiple displacement amplification (MDA). The taxonomic affiliation of the SAGs was determined either by PCR with Picozoa-specific primers14 or by 18S rRNA gene sequencing with universal eukaryotic primers, followed by Illumina sequencing of the MDA products (see "Methods"). Sequencing reads were assembled into genomic contigs, with total assembly sizes ranging from 350 kbp to 66 Mbp (Figure 1a and Supplementary Data 1). From these contigs, the 18S rRNA gene was recovered in 37 of the 43 SAGs, and we used it to build a phylogenetic tree that included reference sequences from the protist ribosomal reference database PR2 (Supplementary Figure 1). Based on this tree, we identified 6 groups representing 32 SAGs, each group having near-identical 18S rRNA gene sequences. SAGs sharing the same ribotype were reassembled by pooling all their reads to obtain longer, more complete co-assemblies (CO-SAGs). The genome size of the CO-SAGs ranged from 32 to 109 Mbp (Figure 1a and Supplementary Data 1), an increase of 5-45% compared with individual SAGs. Genome completeness of the SAGs and CO-SAGs was estimated with two datasets: (i) a set of 255 eukaryotic marker genes available in BUSCO15, and (ii) a set of 317 conserved marker genes derived from a previous pan-eukaryote phylogenomic dataset1, which we used here as the starting point for downstream analyses (Figure 1b). These comparisons showed that while most SAGs were highly incomplete (Figure 1a, b), the CO-SAGs were generally more complete (up to 60%). Taken together, 90% of the BUSCO markers and 88% of the phylogenomic markers were present in at least one assembly, which indicates that although the single-cell genome assemblies are fragmentary, together they represent a much more complete Picozoa meta-assembly.
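The "present in at least one assembly" figure quoted above is a union over assemblies, which is why it can be far higher than the completeness of any single SAG. The following is a minimal sketch of that arithmetic only; the function names and the toy marker identifiers are hypothetical and are not taken from the study's pipeline.

```python
def per_assembly_completeness(assemblies, n_markers):
    """Fraction of the marker set found in each assembly.

    `assemblies` maps an assembly name to the collection of marker IDs
    detected in it (hypothetical input format).
    """
    return {name: len(set(found)) / n_markers for name, found in assemblies.items()}


def pooled_completeness(assemblies, n_markers):
    """Fraction of the marker set found in at least one assembly."""
    seen = set()
    for found in assemblies.values():
        seen |= set(found)  # union of markers over all assemblies
    return len(seen) / n_markers


if __name__ == "__main__":
    # Toy data: four markers in total, two fragmentary assemblies.
    toy = {"SAG_a": {"m1", "m2"}, "SAG_b": {"m2", "m3"}}
    print(per_assembly_completeness(toy, 4))
    print(pooled_completeness(toy, 4))
```

In the toy data each assembly covers half of the markers, while the pooled set covers three quarters, mirroring how fragmentary SAGs can jointly approximate a more complete meta-assembly.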

a Assembly length (in Mbp) of the 17 SAGs and CO-SAGs used for further analysis. Source data are provided in Supplementary Data 1. b Completeness of the ten assemblies used for phylogenomic inference, estimated with the BUSCO dataset of 255 eukaryotic markers and with the set of 317 phylogenomic marker genes. Boxes show the minimum and maximum values (excluding outliers), the first and third quartiles, and the median. Source data are provided in the Source Data file. c Maximum likelihood tree of the 18S rRNA gene, reconstructed with the model GTR+R4+F, with support estimated from 100 non-parametric bootstrap replicates in IQ-TREE. Picozoa CO-SAGs and SAGs are written in bold; the sequence of Picomonas judraskeda and the sequences of the SAGs from Yoon et al.11 are written in bold italics. The group labels BP1-3 are taken from Cuvelier et al.13, and the "deep-branching" lineages are from Moreira and López-García16.

The final 17 assemblies (11 SAGs and 6 CO-SAGs) were mainly placed within the three proposed groups of Picozoa, BP1-3 (Figure 1c), sensu Cuvelier et al.13, but SAG11 was placed outside these groups. The deep-branching picozoan lineages identified by Moreira and López-García16, as well as other possibly early-diverging lineages, were not represented in our data (Figure 1c). Interestingly, one CO-SAG (COSAG03) was closely related to the only described species, Picomonas judraskeda (18S rRNA genes 100% identical), for which no genomic data are available. Using our assemblies and reference sequences from PR2 as queries, we identified by sequence identity 362 OTUs related to Picozoa (≥90%) in the data provided by the Tara Oceans project. Picozoa were found in all major oceanic regions, but usually at low relative abundance in V9 18S rRNA gene amplicon data (less than 1% of the eukaryotic fraction in most cases, Supplementary Figure 2). One exception was the Southern Ocean between South America and Antarctica, where Picozoa-related OTUs accounted for 30% of the V9 18S rRNA gene amplicons in one sample. Picozoa therefore appear to be widespread in the oceans but, based on available sampling, generally low in abundance, although they can reach higher relative abundances at least in polar waters.
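To make the ≥90% identity criterion concrete, the sketch below selects OTUs whose sequence identity to a query reaches a threshold. It is an illustration under stated assumptions, not the search used in the study: it assumes the sequences are already pairwise aligned to equal length with "-" for gaps, and it defines identity as matches over columns where neither sequence has a gap. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences.

    Assumption: identity = identical residues / columns where neither
    sequence has a gap. Other definitions (e.g. over alignment length)
    exist and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def select_related_otus(query, aligned_otus, threshold=90.0):
    """Names of OTUs whose identity to the query is at or above the threshold."""
    return [
        name
        for name, seq in aligned_otus.items()
        if percent_identity(query, seq) >= threshold
    ]


if __name__ == "__main__":
    query = "ACGTACGTAC"
    otus = {"otu_close": "ACGTACGTAA", "otu_far": "ACGTTTTTAA"}  # toy sequences
    print(select_related_otus(query, otus))
```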

To infer the evolutionary origin of Picozoa, we expanded a phylogenomic dataset that contains a broad sampling of eukaryotes and a large number of genes, and that was recently used to study deep nodes in the eukaryotic tree1. Homologues from the SAGs and CO-SAGs, and from a number of newly sequenced key eukaryotes, were added to each single-gene dataset (see Supplementary Table 1 for the list of taxa). After carefully examining the individual genes for contamination and orthology based on single-gene phylogenies (see "Methods"), we retained all six CO-SAGs and four individual SAGs, together with the SAG MS584-11 available from a previous study11. The remaining SAGs were excluded because of poor data coverage (fewer than 5 markers present) and, in one case (SAG33), because it was heavily contaminated with sequences from a cryptophyte (see "Data availability" for the gene trees). In total, our phylogenomic dataset contained 794 taxa and 317 protein-coding genes, with orthologues from Picozoa included in 279 genes (88%) (Figure 1b). Compared with the previously available genomic data for Picozoa, this represents an increase in gene coverage from 18% to 88%. The most complete assembly was COSAG01, from which we identified orthologues for 163 (51%) of the markers.

A concatenated protein alignment of the 317 selected genes was used to infer the phylogenetic position of Picozoa in the eukaryotic tree of life. Initially, a maximum likelihood (ML) tree was reconstructed from the complete 794-taxon dataset using the site-homogeneous model LG+F+G, with support estimated from 1000 ultrafast bootstrap replicates (Supplementary Figure 3). This analysis placed Picozoa together with a clade containing red algae and rhodelphids with strong support (100% UFBoot2), but the monophyly of Archaeplastida was not recovered because cryptists branched within it. To further investigate the position of Picozoa, we applied better-fitting site-heterogeneous models to a reduced dataset of 67 taxa, since these models are computationally much more demanding. The taxon reduction was driven by the need to keep all major groups represented while focusing the sampling on the part of the tree where Picozoa most likely belong, namely Archaeplastida, TSAR, Haptista, and Cryptista. We also merged several closely related lineages into OTUs based on the initial ML tree to reduce missing data (Supplementary Data 2). The 67-taxon dataset was analysed by ML and by Bayesian inference, using the best-fitting site-heterogeneous models LG+C60+F+G-PMSF (with non-parametric bootstrapping) and CAT+GTR+G, respectively. The ML and Bayesian analyses produced highly similar trees and received maximal support for most relationships, including deep divergences (Figure 2). Most interestingly, both analyses recovered the monophyly of Archaeplastida (BS = 93%; PP = 1), with cryptists as the sister lineage (BS = 100%; PP = 1). Consistent with the initial ML tree (Supplementary Figure 3), red algae and rhodelphids branched together (BS = 95%; PP = 1), with Picozoa as their sister with full support (BS = 100%; PP = 1).
This grouping was robust to fast-evolving site removal (Supplementary Figure 4) and to trimming of 25% and 50% of the most biased sites (Supplementary Figure 5), and it was also recovered with a supertree method (ASTRAL-III) consistent with the multi-species coalescent model (Supplementary Figure 6). Although this grouping was robust, an alternative topology was recovered when we trimmed the 50% most heterogeneous sites (Supplementary Figure 7) and after removing genes with fewer than two picozoan sequences (Supplementary Figure 8). In these analyses, Picozoa and red algae were most closely related, although this relationship was never significantly supported. An approximately unbiased (AU) test rejected all tested topologies except two: the one in which Picozoa branched as the closest sister of red algae (p = 0.237) and the topology of Figure 2 (p = 0.822; Supplementary Table 2). Finally, we identified a two-amino-acid replacement signature in the eukaryotic translation elongation factor 2 protein (SA instead of the ancestral GS residues, see Supplementary Data 3) in Picozoa and rhodelphids. This signature was previously shown to unite red algae, green algae (and land plants), haptophytes and some cryptists18. The presence of SA in Picozoa supports their affiliation with red algae and rhodelphids.
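The site-removal robustness checks described above amount to ranking alignment columns by some per-site score and deleting the top fraction before re-inferring the tree. The sketch below illustrates only that trimming step. How the scores are obtained (evolutionary rate, compositional heterogeneity, or another metric) is not specified in this section, so `site_scores` is a hypothetical input, and the function name is illustrative.

```python
def trim_top_sites(alignment, site_scores, fraction):
    """Remove the `fraction` of alignment columns with the highest scores.

    `alignment` maps taxon name -> aligned sequence (all of equal length);
    `site_scores` holds one score per column (higher = removed first).
    Ties keep the leftmost column first in the removal order.
    """
    n_sites = len(site_scores)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence needs one score per column")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    n_remove = int(n_sites * fraction)
    ranked = sorted(range(n_sites), key=lambda i: site_scores[i], reverse=True)
    removed = set(ranked[:n_remove])
    kept = [i for i in range(n_sites) if i not in removed]
    return {taxon: "".join(seq[i] for i in kept) for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy_alignment = {"taxon_a": "ACDE", "taxon_b": "ACDF"}
    toy_scores = [0.1, 2.0, 0.5, 3.0]  # invented per-site scores
    print(trim_top_sites(toy_alignment, toy_scores, 0.5))
```

Running the same function with fractions of 0.25 and 0.5 yields the two trimmed alignments that would then each be passed to tree inference.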

The tree is based on the concatenated alignment of 317 marker genes and was reconstructed using the site-heterogeneous model LG+C60+F+G-PMSF. Support values correspond to 100 non-parametric bootstrap replicates/posterior probabilities estimated using PhyloBayes CAT-GTR+G. Black circles denote full support (=100/1.0). The inset shows the only other topology that was not rejected by the AU topology test, which was also recovered when the 50% most heterogeneous sites of the alignment were trimmed.

Because of conflicting conclusions about the presence of plastids in picozoans, we searched our genomic data extensively for evidence of cryptic plastids. First, we searched the SAG and CO-SAG assemblies for plastid contigs as evidence of a plastid genome. Although some contigs initially showed similarity to reference plastid genomes, on closer inspection they were all rejected as bacterial (non-cyanobacterial) contamination. In contrast, mitochondrial contigs were readily identified in 26 of the 43 SAGs (Supplementary Data 4). Although the mitochondrial contigs remained fragmented in most SAGs, four complete or near-complete mitochondrial genomes were recovered, with coding content nearly identical to that of the published mitochondrial genome from picozoan MS584-11 (ref. 19; Supplementary Figure 9). The ability to assemble complete mitochondrial genomes from SAGs shows that the fragmentary nature of the data does not specifically hinder the recovery of organellar genomes, if present, at least in the case of mitochondria.

Second, we investigated the possibility that the plastid genome was lost while the organelle itself was retained, as is the case in Rhodelphis5. To this end, we reconstructed phylogenetic trees for several essential nuclear-encoded biochemical plastid pathways derived by endosymbiotic gene transfer (EGT), which have been shown to be at least partially retained even in cryptic plastids5,21,22. These included genes involved in the biosynthesis of isoprenoids (ispD, E, F, G, H, dxr, dxs), fatty acids (fabD, F, G, H, I, Z, ACC), heme (hemB, D, E, F, H, Y, ALAS) and iron-sulfur clusters (sufB, C, D, E, S, NifU, iscA; see also Supplementary Data S5). In all cases, the picozoan homologues grouped either with bacteria (but not with cyanobacteria, which suggests contamination) or with mitochondrial/nuclear copies of host origin. In addition, none of the picozoan homologues contained a predicted N-terminal plastid transit peptide. We also searched for picozoan homologues of all other proteins (n = 62) predicted to be targeted to the cryptic plastid in rhodelphids5. This search yielded one protein (arogenate dehydrogenase, OG0000831) with a picozoan homologue closely related to red algae, within a larger clade of host-derived plastid-targeted plant sequences, but none of the picozoan or red algal sequences showed a predicted transit peptide. Finally, to exclude the possibility that sequences were missed because of errors during assembly and gene prediction, we also searched the raw reads for the same plastid-targeted genes and for genes of the plastid import machinery, and found no obvious candidates. In contrast, we readily identified mitochondrial genes (such as homologues of the mitochondrial import machinery from the TIM17/TIM22 family), which further strengthens our inference that the single-cell data are in principle adequate to identify organellar components when they are present.

The lack of cryptic plastids in diverse modern picozoans does not rule out a photosynthetic ancestry if the plastid was lost early in the evolution of the group. To assess this possibility, we searched more broadly for evidence of a cyanobacterial footprint in the nuclear genome, above the background of horizontal gene transfer, among proteins that function in cellular compartments other than the plastid. The existence of a significant number of such proteins may be evidence of a plastid-bearing ancestor. We clustered proteins from 419 genomes, including all major eukaryotic groups and a selection of bacteria, into orthologous groups (OGs) (Supplementary Data 6). We built phylogenies for the OGs that contained at least cyanobacterial and algal sequences together with a sequence from one of 33 focal taxa, which included Picozoa and a range of photosynthetic taxa, as well as taxa with non-photosynthetic plastids and plastid-lacking taxa as controls. A putative gene transfer from cyanobacteria (EGT) was identified as a clade of plastid-bearing eukaryotes that included sequences from the focal taxon and branched as sister to a clade of cyanobacteria. We allowed up to 10% of the sequences in such a clade to come from groups with no plastid ancestry. This approach identified 16 putative EGTs for Picozoa in which at least two different SAGs/CO-SAGs grouped together. In contrast, the number of EGTs in photosynthetic species ranged from 89 to 313, while species with non-photosynthetic plastids had up to 59 EGTs (Figure 3a). At the lower end of the spectrum for species with non-photosynthetic plastids, the inferred numbers of cyanobacterial genes, as in Rhodelphis (14) or paramonas (12), were comparable to those of Picozoa (16) and of other plastid-lacking taxa such as Telonema (15) or Goniomonas (18).
To distinguish these putative endosymbiotic transfers from a background of bacterial transfers (or bacterial contamination), we next attempted to normalise the EGT signal by estimating an extended bacterial signal (putative HGT, indicative of horizontal gene transfer) with the same tree-sorting procedure (Supplementary Figure 10). When comparing the number of inferred EGTs with the number of inferred HGTs, we found a marked difference between plastid-containing lineages (including non-photosynthetic ones) and plastid-lacking lineages. All plastid-containing taxa except Rhodelphis showed a ratio of EGT to HGT above 1, whereas all species without a plastid, Hematodinium (one of the few taxa with reported plastid loss), Rhodelphis and Picozoa had a much higher number of inferred HGTs than EGTs.
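The two decision rules in this analysis, the 10% tolerance when calling a clade a putative EGT and the EGT-to-HGT ratio used for normalisation, can be written down compactly. The sketch below is an illustration under assumptions, not the authors' tree-sorting code: the clade is represented as a plain list of group labels, the sister relationship is passed in as a boolean rather than computed from a tree, and all names and the example counts in the usage section are hypothetical.

```python
NO_PLASTID_ANCESTRY = "no_plastid_ancestry"  # hypothetical label for tolerated outsiders


def is_putative_egt(clade_labels, contains_focal_taxon, sister_is_cyanobacterial,
                    max_outsider_fraction=0.10):
    """Apply the clade criteria described in the text to one candidate clade.

    `clade_labels` holds one label per sequence in the clade; sequences from
    groups without plastid ancestry carry NO_PLASTID_ANCESTRY.
    """
    if not clade_labels or not contains_focal_taxon or not sister_is_cyanobacterial:
        return False
    outsiders = sum(1 for label in clade_labels if label == NO_PLASTID_ANCESTRY)
    return outsiders / len(clade_labels) <= max_outsider_fraction


def egt_hgt_ratio(n_egt, n_hgt):
    """Ratio of inferred EGTs to inferred HGTs.

    Returns None when no HGT was detected, since the ratio is then undefined.
    """
    if n_hgt == 0:
        return None
    return n_egt / n_hgt


if __name__ == "__main__":
    clade = ["plastid_bearing"] * 19 + [NO_PLASTID_ANCESTRY]
    print(is_putative_egt(clade, True, True))
    # 16 is the EGT count reported for Picozoa; the HGT count of 40 is invented.
    print(egt_hgt_ratio(16, 40))
```

A ratio below 1 means more inferred HGT than EGT, and returning `None` for zero HGT keeps such taxa out of the comparison rather than dividing by zero.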

a Number of inferred endosymbiotic gene transfers (EGT) in 33 selected species, representing groups with a photosynthetic plastid (green), a non-photosynthetic plastid (blue), confirmed plastid loss (yellow) and no known plastid ancestry (black). These species are shown in comparison with Picozoa (orange). b Number of EGTs from (a) relative to the number of inferred HGTs in the same 33 selected species. A value below 1 indicates more HGT than EGT, and a value above 1 indicates more EGT than HGT. The ratio for Arabidopsis could not be calculated because no HGT events were detected. Source data are provided in Supplementary Table 3.

The 17 SAGs and CO-SAGs of Picozoa obtained in this study provide robust data for phylogenomic analyses of this important eukaryotic phylum. With these data, we can firmly place Picozoa within the supergroup Archaeplastida, most likely as the sister lineage of red algae and rhodelphids. Archaeplastids contain all known lineages with primary plastids (except Paulinella), which are widely believed to derive from a single primary endosymbiosis with a cyanobacterium. This notion of a common origin of primary plastids is supported by cellular and genomic data (see refs. 23, 24 and references therein for review) and by plastid phylogenetics25,26. Phylogenetic support for Archaeplastida based on host (nuclear) data has been uncertain7,8,27, but our analysis is consistent with recent reports that also recovered a monophyletic origin, here including Picozoa, when using gene- and taxon-rich phylogenomic datasets9,10. This position is important for our understanding of plastid origins because, in contrast to all other archaeplastids known to date, our results indicate that Picozoa lack plastids and plastid-associated EGTs. The lack of plastids in Picozoa had also been inferred from the smaller initial SAG data11 and from ultrastructural observations of P. judraskeda14. Two main hypotheses can explain the lack of plastids in Picozoa: either the group was never photosynthetic, or complete plastid loss occurred early in its evolution.

To argue that Picozoa were never photosynthetic requires that the current distribution of primary plastids be attributed to multiple independent endosymbioses, specifically that the plastids of red algae (and possibly rhodelphids) originated from one or two primary endosymbioses independent of those that led to green algae and glaucophytes. This scenario would involve endosymbioses of closely related cyanobacterial lineages in closely related hosts to explain the many similarities between primary plastids. Although this may sound unlikely, there is accumulating evidence that similar plastids were derived independently from similar endosymbionts in dinoflagellates with tertiary plastids28,29,30, and the same has been argued for primary plastids31,32,33,34. However, a large body of cellular and molecular evidence argues against multiple independent origins of primary plastids, including several features of plastid biology that are not present in cyanobacteria (for example, protein targeting systems, light-harvesting complex proteins or plastid genome architecture)23,24,35. A related explanation could involve secondary endosymbiosis, for example, with the plastid of red algae having been acquired secondarily from a green alga. This latter scenario is made unlikely by the identification of host-derived plastid components shared among all archaeplastid lineages.

The second hypothesis implies that a common ancestor of Picozoa completely lost its primary plastid. Plastid loss in a free-living lineage such as Picozoa would be unprecedented, because the only known unambiguous cases of complete plastid loss to date all come from parasitic lineages (all in Myzozoa: Cryptosporidium37, certain gregarines22,38 and Hematodinium39). To evaluate this possibility, we searched our data for a cyanobacterial footprint in the nuclear genome that would have resulted from an ancestral endosymbiosis. The transfer of genes from the endosymbiont to the host nucleus via EGT, and the targeting of the products of some or all of these genes back to the plastid, are considered a hallmark of organelle integration40,41. EGT has occurred in all algae, although its impact on the nuclear genome varies, and distinguishing EGT from other horizontally acquired genes (HGT) can be difficult for ancient endosymbioses42,43,44,45,46. Our analysis of the normalised cyanobacterial signal in Picozoa, which we used as a proxy to quantify EGT, provided no clear evidence of a plastid-bearing ancestor. However, it should be noted that assessing plastid loss in groups without demonstrated photosynthetic ancestry, such as Picozoa, is complicated because there is no baseline for the footprint of endosymbiosis that survives after plastid loss. Notably, we found no significant difference between the number of inferred EGTs in Picozoa and that in lineages with demonstrated plastid loss (for example, Hematodinium, with 10 inferred EGTs), lineages with non-photosynthetic plastids (for example, Rhodelphis, with 14 inferred EGTs) or lineages with no photosynthetic ancestry (for example, Telonema, with 15 inferred EGTs).

The lack of a genomic baseline for evaluating plastid loss in Picozoa is further complicated by the limitations of our data and methods. The partial nature of eukaryotic SAGs makes it possible that EGTs are absent from our data even if the inferred genomic completeness exceeds 90%. In addition, even if a plastid was once present, the number of EGTs may always have been low during the evolution of the group. Recent endosymbioses in which EGTs can be pinpointed with precision show that their frequency is relatively low. For example, they account for only a few percent of the chromatophore proteome in Paulinella, or for only 9 genes in a tertiary endosymbiosis in dinoflagellates48. It is therefore possible that the higher numbers of EGTs inferred in red algae (for example, 168 in Galdieria) arose after the divergence of Picozoa, and that Picozoa quickly lost their plastid before more EGTs occurred. One observation supporting this hypothesis is the small number of putative EGTs found in Rhodelphis (14), which suggests that most endosymbiotic transfers in red algae occurred after their divergence from rhodelphids.

In this study, we used single-cell genomics to demonstrate that Picozoa are a plastid-lacking major lineage of archaeplastids. To our knowledge, this is the first example of an archaeplastid lineage without plastids, which can be interpreted either as plastid loss or as evidence of independent endosymbioses in the ancestors of red algae and rhodelphids. Under the most widely accepted scenario of a single origin of primary plastids in Archaeplastida, Picozoa would represent the first known case of plastid loss in this group, and more generally in any free-living species. To discriminate between plastid loss and multiple plastid gains in early archaeplastid evolution, and more generally in the evolution of secondary or tertiary plastids, a better understanding of the early steps of plastid integration is needed. In the recently evolved primary plastid-like chromatophores of Paulinella, endosymbiotic gene transfer at the onset of integration was shown to be minimal4. Similar examples of integrated plastid endosymbionts with apparently very few EGTs are known in dinoflagellates48,49. Important new clues for deciphering the origin of plastids may therefore come from a better understanding of the role of the host in driving these endosymbioses, and it is crucial to search for and characterize novel archaeplastid diversity in order to establish a more complete framework of archaeplastid evolution. The fact that this plastid-lacking lineage has never been successfully maintained in culture, with only one study achieving a short-lived culture14, may indicate a lifestyle involving close association with other organisms (such as symbiosis), and further underscores how enigmatic the biology of Picozoa remains.

Surface (depth: up to 2 m) marine water was collected on two occasions from the Linnaeus Microbial Observatory (LMO) in the Baltic Sea, located at 56°N 55.85′ and 17°E 03.64′: 2 May 2018 (6.1 °C and 6.8 ppt salinity) and 3 April 2018 (2.4 °C and 6.7 ppt salinity). The samples were transported to the laboratory and size-fractionated by filtration. The fraction larger than 2 µm was discarded, and the fraction collected on a 0.2 µm filter was resuspended in 2 mL of the filtrate. The obtained samples were used for fluorescence-activated cell sorting (FACS). 4 µL of 1 mM Mitotracker Green FM (ThermoFisher) stock solution was added to each sample, which was then kept in the dark at 15 °C for 15-20 minutes. Cells were then sorted into empty 96-well plates using a MoFlo Astrios EQ cell sorter (Beckman Coulter). Gates were set mainly based on Mitotracker intensity. The dye was detected using 488 nm and 640 nm lasers for excitation, a 100 µm nozzle, a sheath pressure of 25 psi, and 0.1 µm sterile-filtered 1× PBS as sheath fluid. The region with the highest green fluorescence and forward scatter contained the target group and was used for sorting together with exclusion of red autofluorescence (Summit v 6.3.1).

SAGs were generated in each well with the REPLI-g® Single Cell Kit (Qiagen) following the manufacturer's recommendations, but scaled down to 5 µL reactions. Because the cells were sorted into dry plates, 400 nL of 1× PBS was added before 300 nL of lysis buffer D2, followed by 10 minutes at 65 °C and 10 minutes on ice, and then by the addition of 300 nL of stop solution. PBS, reagent D2, stop solution, water and reagent tubes were UV-treated at 2 joules before use. SYTO 13 (Invitrogen) was added to the MDA master mix at a final concentration of 0.5 µM. The reaction was run at 30 °C for 6 hours, followed by inactivation at 65 °C for 5 minutes, and was monitored by detecting SYTO 13 fluorescence every 15 minutes with a FLUOstar® Omega plate reader (BMG Labtech, Germany). The single amplified genome (SAG) DNA was stored at -20 °C until further PCR screening. The obtained products were screened by PCR using the Pico-PCR approach (primers PICOBI01F, 5'-CGGATTTTGGCATCACGC-3' and P01ITS1R, 5'-CATCTCAATGTTCACGTGG-3') as described in ref. 14, and wells showing a Picozoa signal were selected for sequencing.

During three separate cruises in the eastern North Pacific, seawater was collected and sorted using a BD InFlux fluorescence-activated cell sorter (FACS). The instrument was equipped with a 100 mW 488 nm laser and a 100 mW 355 nm laser, and was run with sterile nuclease-free 1× PBS as sheath fluid. The stations where sorting took place were located at 36.748°N, 122.013°W (Station M1; 20 m on 2 April 2014 and 10 m on 5 May 2014); 36.695°N, 122.357°W (Station M2; 10 m, 5 May 2014); and 36.126°N, 123.49°W (Station 67-70; 20 m, 15 October 2013). Water was collected using Niskin bottles mounted on a CTD rosette. Before sorting, samples were concentrated by gravity over a 0.8 μm Supor filter. Two different stains were used: LysoSensor (2 April 2014, M1) and LysoTracker (5 May 2014, M1; 15 October 2013, 67-70), or both together (5 May 2014, M2). Eukaryotic cells stained with LysoTracker Green DND-26 (Life Technologies; final concentration, 25 nM) were selected based on scatter parameters and positive green fluorescence (520/35 nm bandpass) relative to unstained samples, with exclusion of known phytoplankton populations, which were discriminated by their forward angle light scatter and red (chlorophyll-derived) autofluorescence (692/40 nm bandpass) under 488 nm excitation, similar to the methods in ref. 50. Likewise, cells stained with LysoSensor Blue DND-167 (Life Technologies; final concentration, 1 μM), a ratiometric probe sensitive to intracellular pH levels such as those in lysosomes, were selected based on scatter parameters and positive blue fluorescence (435/40 nm bandpass) relative to unstained samples, with exclusion of known phytoplankton populations, which were discriminated by their forward angle light scatter and red (chlorophyll-derived) autofluorescence (692/40 nm bandpass) under 355 nm excitation.
For the classification using two stains, all the above criteria and the excitation of the two lasers (collecting emission through different pinholes and filter sets) are applied to select cells. Before the start of each classification, the respective panels were irradiated with ultraviolet light for 2 minutes. Use the single cell sorting mode from BD FACS software v1.0.0.650 to sort cells into 96-well or 384-well plates. Part of the wells were left blank or received 20 cells respectively as negative and positive controls. After sorting, the plates are covered with sterile, nuclease-free foil and frozen at -80°C immediately after completion.

Whole genome amplification of single sorted cells followed the methods outlined in ref. 50. For initial screening, 18S rRNA gene amplicons were amplified from each well using the Illumina-adapted primers TAReuk454FWD1 (5'-CCAGCASCYGCGGTAATTCC-3') and TAReukREV3 (5'-ACTTTCGTTCTTGATYRA-3'), which target the V4 hypervariable region. PCR reactions contained 10 ng of template DNA, 1× 5PRIME HotMasterMix (Quanta Biosciences), 0.4 mg mL⁻¹ BSA (NEB) and 0.4 μM of each primer. PCR conditions were: 94 °C for 3 min; 30 cycles of 94 °C for 45 s, 50 °C for 60 s and 72 °C for 90 s; and a final extension at 72 °C for 10 min. The triplicate reactions of each cell were pooled before paired-end (PE) library sequencing (2 × 300 bp), and the resulting 18S V4 rRNA gene amplicon reads were trimmed at a Phred quality (Q) of 25 using a 10 bp running window with Sickle 1.33 (https://github.com/najoshi/sickle). Paired-end reads were merged using USEARCH v.9.0.2132 when the reads had an overlap of ≥40 bp with a maximum of 5% mismatch. Merged reads were filtered to remove reads with a maximum error rate >0.001 or a length <200 bp. Sequences with exact matches to both primers were kept, primer sequences were trimmed using Cutadapt v.1.13 (ref. 51), and the remaining sequences were clustered de novo into operational taxonomic units (OTUs) at 99% sequence similarity using UCLUST. Each cell that was sequenced further had an abundant OTU, which was taxonomically identified using BLASTn against the GenBank nr database.
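
The read-filtering thresholds above can be illustrated with a short Python sketch. It assumes that the "error rate" of a merged read is the expected number of errors per base implied by its Phred scores; that definition and the function names are illustrative assumptions, not the USEARCH implementation.

```python
from typing import List


def expected_error_rate(phred_scores: List[int]) -> float:
    """Mean per-base error probability implied by a read's Phred scores.

    Assumption: each score Q corresponds to an error probability 10^(-Q/10).
    """
    if not phred_scores:
        raise ValueError("read has no bases")
    probabilities = [10 ** (-q / 10) for q in phred_scores]
    return sum(probabilities) / len(probabilities)


def keep_merged_read(phred_scores: List[int],
                     max_error_rate: float = 0.001,
                     min_length: int = 200) -> bool:
    """Apply the two thresholds from the text: drop reads whose error rate
    exceeds 0.001 or whose length is below 200 bp."""
    if len(phred_scores) < min_length:
        return False
    return expected_error_rate(phred_scores) <= max_error_rate
```

For example, a 250 bp read with Q40 at every position would pass, whereas the same read at Q20, or a 150 bp read of any quality, would be removed.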

Sequencing libraries were prepared from 100 ng of DNA using the TruSeq Nano DNA Sample Preparation Kit (Cat. No. 20015964/5, Illumina Inc.), targeting an insert size of 350 bp. For six samples, less than 100 ng (between 87 and 97 ng) was used. Library preparation was performed by the SNP&SEQ Technology Platform at Uppsala University according to the manufacturer's instructions. All samples were then sequenced on one lane of an Illumina HiSeqX instrument with 150-cycle paired-end sequencing using v2.5 sequencing chemistry, resulting in 10,000 to 30,000,000 read pairs.

The 43 Illumina datasets were trimmed using Trim Galore v0.6.1 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with default parameters and assembled into genomic contigs with SPAdes v3.13.0 (ref. 52) in single-cell mode (--sc --careful -k 21,33,55,99). Open reading frames (ORFs) were identified and translated using Prodigal v2.6.3 in "anonymous" mode, and eukaryotic rRNA genes were predicted using barrnap v0.9 (https://github.com/tseemann/barrnap). All 18S rRNA gene sequences, together with available reference sequences from the Protist Ribosomal Reference database (PR2, https://pr2-database.org/), were aligned with MAFFT E-INS-i v7.429 (ref. 54) and trimmed with trimAl (ref. 55) (gap threshold 0.01%). After model testing with ModelFinder (ref. 56) (best model: GTR+R6+F), a phylogenetic tree was reconstructed in IQ-TREE v2.1.1 (ref. 57) with 1000 ultrafast bootstrap replicates (see Supplementary Figure 11 for a tree with extended taxon sampling). In addition, we estimated the average nucleotide identity (ANI) for all pairs of SAGs using fastANI v1.2 (ref. 58) (Supplementary Figure 12). Based on the 18S rRNA gene tree and the ANI values, groups of closely related SAGs with almost identical 18S rRNA gene sequences (sequence similarity above 99%) were identified for co-assembly. Co-assemblies were generated in the same way as the single-cell assemblies described above, pooling the sequencing libraries from closely related single cells. ORFs and rRNA genes were similarly extracted from the co-assemblies. The completeness of the SAGs and CO-SAGs was then assessed using BUSCO v4.1.3 (ref. 15) with 255 eukaryotic markers (Supplementary Figure 13) and using the 320-marker phylogenomic dataset described below. General genome characteristics were computed with QUAST v5.0.2 (ref. 59). The 18S rRNA genes from the co-assemblies and from the SAGs not included in any CO-SAG were aligned, in the same way as above, with PR2 reference sequences for Picozoa as well as for cryptists and katablepharids (the groups closest to Picozoa in 18S rRNA gene phylogenies). After model selection, the tree was reconstructed using GTR+R4+F, with support assessed by 100 non-parametric bootstraps. Six CO-SAGs and 11 individual SAGs were used for all subsequent analyses.

For each of these 17 assemblies, we estimated the amount of prokaryotic/viral contamination by comparing the predicted proteins with the NCBI nr database using DIAMOND in blastp mode (ref. 60). If at least 60% of all proteins from a contig produced significant hits only to sequences annotated as prokaryotic or viral, we considered the contig to be a putative contaminant. In general, only a small fraction of each assembly was found to be such contamination (Supplementary Figure 14).
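
The 60% rule can be written down as a small Python sketch. The input format (one set of hit categories per predicted protein) and the category labels are hypothetical; in practice they would have to be derived from the DIAMOND hits and their taxonomic annotation.

```python
from typing import List, Set

# Hypothetical category labels for the taxonomy of database hits.
CONTAMINANT_CATEGORIES = {"prokaryote", "virus"}


def is_putative_contaminant(hit_categories: List[Set[str]],
                            min_fraction: float = 0.6) -> bool:
    """Flag a contig when at least `min_fraction` of ALL its proteins have
    significant hits only to prokaryotic or viral sequences.

    hit_categories holds one entry per predicted protein on the contig:
    the set of categories among its significant hits (empty set = no hits).
    """
    n_proteins = len(hit_categories)
    if n_proteins == 0:
        return False
    only_contaminant = sum(
        1 for cats in hit_categories
        if cats and cats <= CONTAMINANT_CATEGORIES
    )
    return only_contaminant / n_proteins >= min_fraction
```

Proteins without hits, or with mixed eukaryotic and prokaryotic hits, count towards the denominator but not the numerator, which is one reading of "at least 60% of all the proteins".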

Existing untrimmed alignments for 320 genes and 763 taxa from ref. 1 were used to create HMM profiles in HMMER v3.2.1 (ref. 61), which were then used to identify homologous sequences among the proteins predicted from the Picozoa assemblies (or co-assemblies) and from 20 additional, recently sequenced eukaryotic genomes and transcriptomes (Supplementary Table 1). Each single-gene dataset was filtered using PREQUAL v1.02 (ref. 62) to remove non-homologous residues prior to alignment, aligned using MAFFT E-INS-i, and filtered with Divvier -partial v1.0 (ref. 63). The alignments were then used to reconstruct gene trees with IQ-TREE (-mset LG,LG4X; 1000 ultrafast bootstraps with the BNNI optimization). All trees were manually inspected to identify contaminants and paralogs. These steps were repeated at least twice, until no further contaminants or paralogs could be detected. We excluded three genes that showed ambiguous groupings of Picozoa or rhodelphids in different parts of the tree. From this full dataset of 317 genes and 794 taxa, we created a concatenated supermatrix alignment from the cleaned alignments described above. This supermatrix was used to reconstruct a tree in IQ-TREE with the model LG+G+F, with support estimated by ultrafast bootstraps (1000 UFBoots) with the BNNI improvement.
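
As an illustration of the concatenation step, the following Python sketch joins single-gene alignments into one supermatrix and pads taxa that are missing from a gene with gaps. The data layout (one dict per gene, mapping taxon to aligned sequence) and the function name are assumptions made for this example, not the authors' scripts.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate per-gene alignments into a supermatrix.

    Each alignment maps taxon name -> aligned sequence; all sequences within
    one alignment must have the same length. Taxa absent from a gene are
    filled with gap characters for that gene.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

Padding with gaps keeps every row the same length, so the proportion of missing data per taxon can be read directly from the supermatrix.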

We then prepared a reduced dataset with a more focused taxon sampling of 67 taxa, covering all major eukaryotic lineages but focusing on the groups for which an affiliation with Picozoa had been reported previously. For this dataset, closely related species were in some cases merged into OTUs to decrease the amount of missing data per taxon (Supplementary Data 2). The 317 single-gene datasets were re-aligned using MAFFT E-INS-i, filtered using either Divvier -partial or BMGE (-g 0.2 -b 10 -m BLOSUM75, v1.12), and concatenated into two supermatrices. Model selection including mixture models was performed for both datasets using ModelFinder (ref. 56); in both cases, LG+C60+G+F was selected as the best-fitting model. Trees were reconstructed for both datasets using the posterior mean site frequency (PMSF) approximation (ref. 64) of this mixture model in IQ-TREE, with support assessed by 100 non-parametric bootstraps (see Supplementary Figure 15 for the Divvier-derived tree).

In addition, we reconstructed a phylogenetic tree from the BMGE-trimmed supermatrix alignment using the CAT+GTR+G model in PhyloBayes MPI v1.8 (ref. 65). We ran three independent chains for 3,600 cycles, discarding the first 1,500 cycles of each chain as burn-in. A consensus tree was then generated with the bpcomp program of PhyloBayes. Chains 1 and 2 reached partial convergence, with a maxdiff value of 0.26 (Supplementary Figure 16). The third chain differed only in the position of haptophytes and Ancoracysta twista, whereas the relationships within Archaeplastida and the position of Picozoa did not differ (Supplementary Figure 17).
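
The maxdiff value summarizes how strongly two chains disagree after burn-in. A simplified Python sketch of the idea follows; it assumes each sampled tree is represented as a set of bipartitions, which is a hypothetical in-memory format (bpcomp itself works on tree files and is not reproduced here).

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Bipartition = FrozenSet[str]


def bipartition_frequencies(trees: List[Set[Bipartition]]) -> Dict[Bipartition, float]:
    """Fraction of sampled trees in which each bipartition occurs."""
    if not trees:
        raise ValueError("no trees left after burn-in")
    counts = Counter(bp for tree in trees for bp in tree)
    return {bp: n / len(trees) for bp, n in counts.items()}


def maxdiff(chain_a: List[Set[Bipartition]],
            chain_b: List[Set[Bipartition]],
            burnin: int = 0) -> float:
    """Largest difference in bipartition frequency between two chains,
    after discarding the first `burnin` samples of each chain."""
    freq_a = bipartition_frequencies(chain_a[burnin:])
    freq_b = bipartition_frequencies(chain_b[burnin:])
    all_bipartitions = set(freq_a) | set(freq_b)
    if not all_bipartitions:
        return 0.0
    return max(abs(freq_a.get(bp, 0.0) - freq_b.get(bp, 0.0))
               for bp in all_bipartitions)
```

A value of 0 would mean that both chains support every bipartition equally often, while a value of 1 would mean that some bipartition is present in all samples of one chain and in none of the other.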

To test the robustness of our results, we also performed a fast-site removal analysis (ref. 66), iteratively removing the 5,000 fastest-evolving sites (up to a total of 55,000 sites removed). For each of these 11 alignments, we reconstructed an ML tree in IQ-TREE using the model LG+C60+G+F with ultrafast bootstraps (1000 UFBoots), and evaluated the support for the branching of Picozoa with rhodelphids and red algae, as well as for other groupings (Supplementary Figure 4). We also removed the 25% and 50% most heterogeneous sites based on the χ2 metric and performed tree reconstruction using the same model as above (Supplementary Figures 5 and 7). Finally, we prepared a supermatrix alignment (BMGE-trimmed) from 224 genes of the final dataset and performed a similar tree reconstruction (model LG+C60+G+F in IQ-TREE, with 1000 ultrafast bootstraps; Supplementary Figure 8).
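
The site-removal schedule (11 steps of 5,000 sites, 55,000 sites in total) can be sketched in Python as follows. Per-site rate estimates are assumed to be available as a list, and the function name is hypothetical.

```python
from typing import List


def fast_site_removal_steps(site_rates: List[float],
                            step: int = 5000,
                            n_steps: int = 11) -> List[List[int]]:
    """For each step k = 1..n_steps, return the sorted column indices that
    remain after removing the k * step fastest-evolving sites."""
    # Rank columns from fastest to slowest; ties keep their original order.
    fastest_first = sorted(range(len(site_rates)),
                           key=lambda i: site_rates[i], reverse=True)
    retained_per_step = []
    for k in range(1, n_steps + 1):
        removed = set(fastest_first[:k * step])
        retained_per_step.append(
            [i for i in range(len(site_rates)) if i not in removed]
        )
    return retained_per_step
```

Each returned index list defines one reduced alignment, on which a separate tree would then be reconstructed.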

In addition, we performed a supertree-based phylogenetic reconstruction using ASTRAL-III v5.7.3 (ref. 68). We reconstructed gene trees for each of the 317 alignments of the 67-taxon dataset using IQ-TREE (-m TEST -mset LG -mrate G,R4 -madd LG4X,LG4X+F,LG4M,LG4M+F, with 1000 ultrafast bootstraps) and performed multi-locus bootstrapping based on the bootstrap replicates (option -b in ASTRAL-III) (Supplementary Figure 6).

Finally, we performed an approximately unbiased (AU) test in IQ-TREE on 15 topologies (see Supplementary Table 2), including previously recovered positions of Picozoa (as sister to red algae, cryptists, telonemids or Archaeplastida).

Using the published picozoan mitochondrial genome (picozoan MS584-11: MG202007.1, ref. 19), BLAST searches were performed on a dedicated Sequenceserver (ref. 69) to identify mitochondrial contigs in the 43 Picozoa SAGs. Putative mitochondrial contigs were annotated using the MFannot server (https://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl). All contigs that carried predicted mitochondrial genes, or whose top hits in the NCBI nr database were to the published picozoan mitochondrial genome (MG202007.1, https://www.ncbi.nlm.nih.gov/nuccore/MG202007.1/), were considered to be bona fide mitochondrial contigs and retained (Supplementary material). Annotations were manually curated where needed.

GetOrganelle v1.7.1 (ref. 70) was used to identify organelle genomes. We used the subcommand 'get_organelle_from_assembly.py -F embplant_pt,other_pt' to search for putative plastid contigs, and also attempted to assemble such a genome directly from the reads using the command 'get_organelle_from_reads.py -R 30 -k 21,45,65,85,105 -F embplant_pt,other_pt'. In addition, we used DIAMOND v2.0.6 (ref. 60) in blastp mode (--more-sensitive) to search the predicted proteins against available plastid protein sequences from NCBI. Contigs identified as putatively derived from a plastid genome were then checked manually using BLAST searches against NT, and contigs showing similarity only to bacterial genomes or to the picozoan mitochondrial assembly MG202007.1 (https://www.ncbi.nlm.nih.gov/nuccore/MG202007.1/) were rejected.

To search for known plastid pathways, we prepared Hidden Markov Model (HMM) profiles from alignments of 32 genes that have been shown to be retained in lineages with non-photosynthetic plastids; the alignments included a wide variety of plastid-bearing eukaryotes, following a similar approach to that in ref. 22. Using these profiles, we identified homologues in the Picozoa SAGs and aligned them, together with the initial sequences used to create the profiles, using MAFFT E-INS-i. We trimmed the alignments using trimAl v1.4.rev15 ('-gt 0.05') and reconstructed phylogenetic trees from these alignments using IQ-TREE (-m LG4X; 1000 ultrafast bootstraps with the BNNI optimization). We then manually inspected the trees to assess whether picozoan sequences grouped with known plastid-bearing lineages. We also used the sequences of these core plastid genes to search the raw sequencing reads for any signs of homologues that might have been missed in the assemblies. We used the tool PhyloMagnet v0.7 (ref. 71) to recruit reads and perform gene-centric assembly of these genes (ref. 72). The assembled genes were then compared with the NR database using DIAMOND in blastp mode (--more-sensitive --top 10).

To identify putative endosymbiotic gene transfers (EGT), we used OrthoFinder v2.4.0 (ref. 73) to prepare orthologous clusters for 419 species (128 bacteria and 291 eukaryotes), with a focus on plastid-bearing eukaryotes and cyanobacteria, but also including other eukaryotes and bacteria. For Picozoa and a selection of 32 photosynthetic or heterotrophic lineages (Supplementary Table 3), we inferred trees for the 2626 clusters that contained the species under consideration, at least one cyanobacterial sequence, and at least one sequence from a primary-plastid-bearing red alga, green alga or plant. Alignments for these clusters were generated using MAFFT E-INS-i and filtered using trimAl ('-gt 0.01'), and phylogenetic trees were reconstructed using IQ-TREE (-m LG4X; 1000 ultrafast bootstraps with the BNNI optimization). We then identified trees in which the target species grouped with other plastid-bearing lineages (allowing up to 10% non-plastid sequences) as sister to a clade of at least two cyanobacterial sequences. For Picozoa, we added the condition that sequences from at least two SAG/CO-SAG assemblies had to be monophyletic. For species with no known plastid ancestry, such as Rattus or Phytophthora, putative EGTs can be interpreted as false positives due to contamination, poor tree resolution or other mechanisms, because EGT from cyanobacteria is not expected in these species. These species therefore provide a rough estimate of the expected false-positive rate of the method, that is, a baseline of false positives that is also expected for Picozoa.
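
The tree-based filter can be paraphrased as a small decision rule. The Python sketch below is one possible reading of it, assuming that the relevant clade and its sister clade have already been extracted from a gene tree and reduced to category labels; the labels, the function name and the requirement that the sister clade be purely cyanobacterial are assumptions of this example, not the published code.

```python
from typing import List


def is_egt_candidate(clade_labels: List[str],
                     sister_labels: List[str],
                     max_non_plastid_fraction: float = 0.10,
                     min_cyanobacteria: int = 2) -> bool:
    """Decide whether one gene tree supports a putative EGT.

    clade_labels: one label per sequence in the clade containing the target
        species: "target", "plastid" (plastid-bearing lineage) or "other".
    sister_labels: one label per sequence in the sister clade, e.g.
        "cyanobacteria".
    """
    # The target must group with at least one other plastid-bearing lineage.
    if "target" not in clade_labels or "plastid" not in clade_labels:
        return False
    # Tolerate a small proportion of sequences from non-plastid lineages.
    non_plastid = sum(1 for label in clade_labels if label == "other")
    if non_plastid / len(clade_labels) > max_non_plastid_fraction:
        return False
    # Assumption: the sister clade consists only of cyanobacteria.
    return (len(sister_labels) >= min_cyanobacteria
            and all(label == "cyanobacteria" for label in sister_labels))
```

Applying the same rule to species without plastid ancestry would give a count of false positives against which the picozoan count can be compared.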

To put the number of putative EGTs in relation to the overall amount of gene transfer, we applied an approach very similar to the one described above to detect putative horizontal gene transfer (HGT) events. We prepared additional trees for clusters containing the taxon of interest and non-cyanobacterial bacteria (in the same way as for the detection of EGT), and identified clades of the taxon under consideration (together with its larger taxonomic group, e.g. Streptophyta for Arabidopsis or Metazoa for Rattus) that branched as sister to a clade of bacteria.

We screened the OTUs obtained from the V9 18S rRNA gene eukaryotic amplicon data generated by Tara Oceans (ref. 17) for sequences related to Picozoa. Using the V9 region of the 18S rRNA gene sequences from the 17 Picozoa assemblies and of the picozoan PR2 references used to reconstruct the 18S rRNA gene tree above, we applied VSEARCH v2.15.1 (ref. 74) (--usearch_global -iddef 1 --id 0.90) to find all OTUs whose V9 region was at least 90% similar to any of these reference Picozoa sequences. Using the relative abundance information available for each Tara Oceans sampling location, we then calculated the sum of all identified Picozoa OTUs at each site and plotted the relative abundance on a world map.
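
The per-site summation can be sketched in Python as follows, assuming the OTU table is held as a nested dict (site, then OTU identifier, then relative abundance). This layout, the identifiers and the function name are illustrative only.

```python
from typing import Dict, Set


def picozoa_abundance_per_site(otu_table: Dict[str, Dict[str, float]],
                               picozoa_otus: Set[str]) -> Dict[str, float]:
    """Sum the relative abundance of all OTUs identified as Picozoa
    (e.g. by the 90% similarity search) at each sampling site."""
    return {
        site: sum(abundance for otu, abundance in otus.items()
                  if otu in picozoa_otus)
        for site, otus in otu_table.items()
    }
```

Sites without any matching OTU receive a value of 0, so they can still be drawn on the map.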

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

All data used for the analyses, as well as result files such as contigs and single-gene trees, are available on figshare (https://doi.org/10.6084/m9.figshare.c.5388176). A Sequenceserver BLAST server was set up for the SAG assemblies: http://evocellbio.com/SAGdb/burki/. The raw sequencing reads have been deposited in NCBI's Sequence Read Archive (SRA) under the accession number PRJNA747736. Source data are provided with this paper.

All custom scripts used in this study are available under the MIT license at https://github.com/maxemil/picozoa-scripts (https://doi.org/10.5281/zenodo.5561108).

1. Strassert, J. F. H., Irisarri, I., Williams, T. A. & Burki, F. A molecular timescale for eukaryote evolution with implications for the origin of red algal-derived plastids. Nat. Commun. 12, 1879 (2021).

2. Burki, F., Roger, A. J., Brown, M. W. & Simpson, A. G. B. The new tree of eukaryotes. Trends Ecol. Evol. 35, 43–55 (2019).

3. Marin, B., Nowack, E. C. M. & Melkonian, M. A plastid in the making: evidence for a second primary endosymbiosis. Protist 156, 425–432 (2005).

4. Nowack, E. C. M. & Weber, A. P. M. Genomics-informed insights into endosymbiotic organelle evolution in photosynthetic eukaryotes. Annu. Rev. Plant Biol. 69, 1–34 (2018).

5. Gawryluk, R. M. R. et al. Non-photosynthetic predators are sister to red algae. Nature 572, 240–243 (2019).

6. Li, L. et al. The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. Nat. Ecol. Evol. 4, 1220–1231 (2020).

7. Burki, F. et al. Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Proc. R. Soc. B Biol. Sci. 283, 20152802 (2016).

8. Strassert, J. F. H., Jamy, M., Mylnikov, A. P., Tikhonenkov, D. V. & Burki, F. New phylogenomic analysis of the enigmatic phylum Telonemia further resolves the eukaryote tree of life. Mol. Biol. Evol. 36, 757–765 (2019).

9. Lax, G. et al. Hemimastigophora is a novel supra-kingdom-level lineage of eukaryotes. Nature 564, 410–414 (2018).

10. Irisarri, I., Strassert, J. F. H. & Burki, F. Phylogenomic insights into the origin of primary plastids. Syst. Biol. https://doi.org/10.1093/sysbio/syab036 (2021).

11. Yoon, H. S. et al. Single-cell genomics reveals organismal interactions in uncultivated marine protists. Science 332, 714–717 (2011).

12. Not, F. et al. Picobiliphytes: a marine picoplanktonic algal group with unknown affinities to other eukaryotes. Science 315, 253–255 (2007).

13. Cuvelier, M. L. et al. Widespread distribution of a unique marine protistan lineage. Environ. Microbiol. 10, 1621–1634 (2008).

14. Seenivasan, R., Sausen, N., Medlin, L. K. & Melkonian, M. Picomonas judraskeda gen. et sp. nov.: the first identified member of the Picozoa phylum nov., a widespread group of picoeukaryotes, formerly known as 'picobiliphytes'. PLoS ONE 8, e59565 (2013).

15. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

16. Moreira, D. & López-García, P. The rise and fall of Picobiliphytes: how assumed autotrophs turned out to be heterotrophs. BioEssays 36, 468–474 (2014).

17. de Vargas, C. et al. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015).

18. Kim, E. & Graham, L. E. EEF2 analysis challenges the monophyly of Archaeplastida and Chromalveolata. PLoS ONE 3, e2621 (2008).

19. Janouškovec, J. et al. A new lineage of eukaryotes illuminates early mitochondrial genome reduction. Curr. Biol. 27, 3717–3724.e5 (2017).

20. Wideman, J. G. et al. Unexpected mitochondrial genome diversity revealed by targeted single-cell genomics of heterotrophic flagellated protists. Nat. Microbiol. 5, 154–165 (2020).

21. Dorrell, R. G. et al. Principles of plastid reductive evolution illuminated by nonphotosynthetic chrysophytes. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.1819976116 (2019).

22. Mathur, V. et al. Multiple independent origins of apicomplexan-like parasites. Curr. Biol. 29, 2936–2941.e5 (2019).

23. Reyes-Prieto, A., Weber, A. P. M. & Bhattacharya, D. The origin and establishment of the plastid in algae and plants. Annu. Rev. Genet. 41, 147–168 (2007).

24. Gould, S. B., Waller, R. F. & McFadden, G. I. Plastid evolution. Annu. Rev. Plant Biol. 59, 491–517 (2008).

25. Shih, P. M. et al. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc. Natl Acad. Sci. USA 110, 1053–1058 (2013).

26. Ponce-Toledo, R. I. et al. An early-branching freshwater cyanobacterium at the origin of plastids. Curr. Biol. 27, 386–391 (2017).

27. Yabuki, A. et al. Palpitomonas bilix represents a basal cryptist lineage: insight into the character evolution in Cryptista. Sci. Rep. 4, 4641 (2014).

28. Hehenberger, E., Gast, R. J. & Keeling, P. J. A kleptoplastidic dinoflagellate and the tipping point between transient and fully integrated plastid endosymbiosis. Proc. Natl Acad. Sci. USA 116, 17934–17942 (2019).

29. Sarai, C. et al. Dinoflagellates with relic endosymbiont nuclei as models for elucidating organellogenesis. Proc. Natl Acad. Sci. USA 117, 5364–5375 (2020).

30. Yamada, N., Sakai, H., Onuma, R., Kroth, P. G. & Horiguchi, T. Five non-motile dinotom dinoflagellates of the genus Dinothrix. Front. Plant Sci. 11, 591050 (2020).

31. Stiller, J. W., Reel, D. C. & Johnson, J. C. A single origin of plastids revisited: convergent evolution in organellar genome content. J. Phycol. 39, 95–105 (2003).

32. Larkum, A. W. D., Lockhart, P. J. & Howe, C. J. Shopping for plastids. Trends Plant Sci. 12, 189–195 (2007).

33. Howe, C. J., Barbrook, A. C., Nisbet, R. E. R., Lockhart, P. J. & Larkum, A. W. D. The origin of plastids. Philos. Trans. R. Soc. B Biol. Sci. 363, 2675–2685 (2008).

34. Stiller, J. W. Toward an empirical framework for interpreting plastid evolution. J. Phycol. 50, 462–471 (2014).

35. Bhattacharya, D., Archibald, J. M., Weber, A. P. M. & Reyes-Prieto, A. How do endosymbionts become organelles? Understanding early events in plastid evolution. BioEssays 29, 1239–1246 (2007).

36. Kim, E. & Maruyama, S. A contemplation on the secondary origin of green algal and plant plastids. Acta Soc. Bot. Pol. 83, 331–336 (2014).

37. Zhu, G., Marchewka, M. J. & Keithly, J. S. Cryptosporidium parvum appears to lack a plastid genome. Microbiology 146, 315–321 (2000).

38. Janouškovec, J. et al. Apicomplexan-like parasites are polyphyletic and widely but selectively dependent on cryptic plastid organelles. eLife 8, e49662 (2019).

39. Gornik, S. G. et al. Endosymbiosis undone by stepwise elimination of the plastid in a parasitic dinoflagellate. Proc. Natl Acad. Sci. USA 112, 5767–5772 (2015).

40. Timmis, J. N., Ayliffe, M. A., Huang, C. Y. & Martin, W. Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nat. Rev. Genet. 5, 123–135 (2004).

41. Archibald, J. M. Genomic perspectives on the birth and spread of plastids. Proc. Natl Acad. Sci. USA 112, 10147–10153 (2015).

42. Burki, F. et al. Re-evaluating the green versus red signal in eukaryotes with secondary plastid of red algal origin. Genome Biol. Evol. 4, 626–635 (2012).

43. Deschamps, P. & Moreira, D. Reevaluating the green contribution to diatom genomes. Genome Biol. Evol. 4, 683–688 (2012).

44. Qiu, H., Yoon, H. S. & Bhattacharya, D. Algal endosymbionts as vectors of horizontal gene transfer in photosynthetic eukaryotes. Front. Plant Sci. 4, 1–8 (2013).

45. Morozov, A. A. & Galachyants, Y. P. Diatom genes originating from red and green algae: implications for the secondary endosymbiosis models. Mar. Genomics 45, 72–78 (2019).

46. Sibbald, S. J. & Archibald, J. M. Genomic insights into plastid evolution. Genome Biol. Evol. 12, evaa096 (2020).

47. Singer, A. et al. Massive protein import into the early-evolutionary-stage photosynthetic organelle of the amoeba Paulinella chromatophora. Curr. Biol. 27, 2763–2773.e5 (2017).

48. Burki, F. et al. Endosymbiotic gene transfer in tertiary plastid-containing dinoflagellates. Eukaryot. Cell 13, 246–255 (2014).

49. Hehenberger, E., Burki, F., Kolisko, M. & Keeling, P. J. Functional relationship between a dinoflagellate host and its diatom endosymbiont. Mol. Biol. Evol. 33, 2376–2390 (2016).

50. Needham, D. M. et al. A distinct lineage of giant viruses brings a rhodopsin photosystem to unicellular marine predators. Proc. Natl Acad. Sci. USA 116, 20574–20583 (2019).

51. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).

52. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

53. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).

54. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

55. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

56. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).

57. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).

58. Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).

59. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).

60. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

61. Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).

62. Whelan, S., Irisarri, I. & Burki, F. PREQUAL: detecting non-homologous characters in sets of unaligned homologous sequences. Bioinformatics 34, 3929–3930 (2018).

63. Ali, R. H., Bogusz, M. & Whelan, S. Identifying clusters of high confidence homologies in multiple sequence alignments. Mol. Biol. Evol. 36, 2340–2351 (2019).

64. Wang, H.-C., Minh, B. Q., Susko, E. & Roger, A. J. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol. 67, 216–235 (2018).

65. Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013).

66. Susko, E., Field, C., Blouin, C. & Roger, A. J. Estimation of rates-across-sites distributions in phylogenetic substitution models. Syst. Biol. 52, 594–603 (2003).

67. Viklund, J., Ettema, T. J. G. & Andersson, S. G. E. Independent genome reduction and phylogenetic reclassification of the oceanic SAR11 clade. Mol. Biol. Evol. 29, 599–615 (2012).

68. Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19, 153 (2018).

69. Priyam, A. et al. Sequenceserver: a modern graphical user interface for custom BLAST databases. Mol. Biol. Evol. 36, 2922–2924 (2019).

70. Jin, J.-J. et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 21, 241 (2020).

71. Schön, M. E., Eme, L. & Ettema, T. J. G. PhyloMagnet: fast and accurate screening of short-read meta-omics data using gene-centric phylogenetics. Bioinformatics https://doi.org/10.1093/bioinformatics/btz799 (2019).

72. Huson, D. H. et al. Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads. Microbiome 5, 11 (2017).

73. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

74. Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).

This work was supported by a grant from Science for Life Laboratory available to FB and a scholarship from Carl Tryggers Stiftelse to VVZ (PI: FB). TJGE thanks the European Research Council (ERC consolidator grant 817834); the Dutch Research Council (NWO-VICI grant VI.C.192.016); the Moore-Simons Project on the Origin of the Eukaryotic Cell (Simons Foundation 735925LPI, https://doi.org/10.46714/735925LPI); and the Marie Skłodowska-Curie ITN project SINGEK (H2020-MSCA-ITN-2015-675752), which provided funding for MES. PJK and VM were supported by an Investigator Grant from the Gordon and Betty Moore Foundation (https://doi.org/10.37807/GBMF9201). The Pacific Ocean work was supported by GBMF3788 to AZW. Sampling at the LMO station in the Baltic Sea was supported by the Swedish Research Council VR and the marine strategic research programme EcoChange to JP. Sequencing was performed by the SNP&SEQ Technology Platform in Uppsala, which is part of the National Genomics Infrastructure (NGI) Sweden and Science for Life Laboratory. The SNP&SEQ platform is also supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation. Cell sorting and whole genome amplification were performed at the Microbial Single Cell Genomics Facility (MSCG) at SciLifeLab. Computations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under projects SNIC 2019/3-305, SNIC 2020/15-58, SNIC 2021/5-50 and Uppstore2018069. Finally, we thank Eunsoo Kim and Sally D. Warring for sharing peptide models from Palpitomonas bilix and Roombia truncata.

Open access funding provided by Uppsala University.

Current address: Department of Invertebrates, Faculty of Biology, St. Petersburg State University, St. Petersburg, Russia

Camille Poirier & Varsha Mathur

Current address: Department of Zoology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK

Current address: Institute of Biodiversity and Ecosystem Dynamics, University of Amsterdam, Netherlands

Jürgen FH Strassert

Current address: Department of Ecosystem Research, Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany

Department of Organismal Biology, Program in Systematic Biology, Uppsala University, Uppsala, Sweden

Max E. Schön, Vasily V. Zlatogursky, Jürgen FH Strassert and Fabien Burki

Department of Cell and Molecular Biology, Program in Molecular Evolution, Uppsala University, Uppsala, Sweden

Biodesign Center for Evolutionary Mechanisms, College of Life Sciences, Arizona State University, Tempe, Arizona, USA

Rohan P. Singh & Jeremy G. Wideman

Marine Ecosystem Biology, RD3, GEOMAR Helmholtz Marine Research Center Kiel, Kiel, Germany

Camille Poirier & Alexandra Z. Worden

Monterey Bay Aquarium Research Institute, Moss Landing, California, USA

Camille Poirier, Susanne Wilken and Alexandra Z. Worden

Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada

Varsha Mathur & Patrick J. Keeling

Center for Ecology and Evolution of Microbial Model Systems-EEMIS, Linnaeus University, Kalmar, Sweden

Microbiology Laboratory, Wageningen University, Wageningen 6708 WE, Netherlands

Science for Life Laboratory, Uppsala University, Uppsala, Sweden

FB and JGW conceived the study. For the Baltic Sea samples, VVZ and JP carried out sampling and cell sorting; VVZ and JFHS carried out genome amplification and sequencing preparations. For the Pacific samples, CP, SW and AZW conceived, developed and implemented the classification protocols and performed sampling, cell sorting and sequencing; CP and AZW performed the initial sequence analysis and phylogeny. MES performed the assembly and phylogenetic analysis of the SAGs and Co-SAGs under the supervision of TJGE and FB, and searched for plastid evidence and gene transfer with the help of the mitochondrial genome assembled by VM, PJK, RPS and JGW. MES, FB and JGW drafted the manuscript. TJGE, PJK, AZW, JP, VVZ and JFHS edited the manuscript. All authors read and approved the final version.

The authors declare no competing interests.

Peer review information: Nature Communications thanks David Moreira and Hwan Su Yoon for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate whether changes were made. The images or other third-party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Schön, M.E., Zlatogursky, V.V., Singh, R.P. et al. Single cell genomics reveals plastid-lacking Picozoa are close relatives to red algae. Nat Commun 12, 6651 (2021). https://doi.org/10.1038/s41467-021-26918-0

DOI: https://doi.org/10.1038/s41467-021-26918-0

Nature Communications (Nat Commun) ISSN 2041-1723 (online)